

Incident Response Playbook | By Petronella Technology Group

A comprehensive incident response plan template with playbooks, communication plans, forensics checklists, and automation scripts for cybersecurity teams.

License: MIT Petronella Technology Group




Why You Need an Incident Response Plan

The global average cost of a data breach reached $4.88 million in 2024 (IBM Cost of a Data Breach Report). Organizations with a tested incident response plan reduced breach costs by an average of $2.66 million compared to those without one. Despite this, 77% of organizations still lack a formal, consistently applied incident response plan.

An incident response plan is not optional -- it is a business requirement. Every compliance framework, from HIPAA to CMMC to SOC 2, mandates documented incident response procedures. Beyond compliance, a well-executed response can mean the difference between a contained security event and a catastrophic breach that destroys customer trust, triggers regulatory penalties, and results in litigation.

This playbook provides everything you need to build, test, and maintain an incident response capability, whether you are a small business creating your first plan or an enterprise refining existing procedures.

Key Benefits of a Formal IR Plan


The NIST Incident Response Framework

This playbook follows the NIST SP 800-61 Rev. 2 (Computer Security Incident Handling Guide) framework, which defines four phases of incident response. This framework is the industry standard and is referenced by virtually every compliance framework.

+------------------+     +----------------------+     +----------------------------------+     +---------------------+
|   Preparation    | --> | Detection & Analysis | --> | Containment, Eradication &       | --> | Post-Incident       |
|                  |     |                      |     | Recovery                         |     | Activity            |
+------------------+     +----------------------+     +----------------------------------+     +---------------------+
        ^                                                                                              |
        |______________________________________________________________________________________________|
                                            (Lessons Learned Feed Back)

Each phase builds on the previous one, and lessons learned from post-incident review feed back into preparation, creating a continuous improvement cycle.


Phase 1: Preparation

Preparation is the most critical phase. When an incident occurs, there is no time to build capabilities from scratch. Everything must be in place before the first alert fires.

1.1 Build the Incident Response Team (IRT)

Your IRT should include representatives from every function that would be involved in responding to a security incident:

| Role | Responsibilities | Typical Staff |
|------|------------------|---------------|
| IR Manager | Overall incident coordination, executive communication | CISO, IT Director |
| Lead Analyst | Technical investigation, evidence collection, forensics | Senior Security Engineer |
| Network Analyst | Network containment, traffic analysis, firewall rules | Network Engineer |
| Systems Analyst | Endpoint containment, malware analysis, system recovery | Systems Administrator |
| Communications Lead | Internal/external messaging, media relations | PR/Communications Director |
| Legal Counsel | Regulatory notification, evidence preservation, liability | General Counsel or outside firm |
| HR Representative | Insider threat cases, employee communications | HR Director |
| Executive Sponsor | Resource allocation, business decisions, board communication | CEO, COO, or CTO |

1.2 Essential Tools and Resources

Every IR team needs the following tools pre-deployed and tested:

Detection and Monitoring:
- SIEM (Security Information and Event Management) -- Splunk, Microsoft Sentinel, Elastic SIEM
- EDR (Endpoint Detection and Response) -- CrowdStrike, SentinelOne, Microsoft Defender for Endpoint
- NDR (Network Detection and Response) -- Darktrace, Vectra, ExtraHop
- Email security gateway with sandboxing capabilities

Forensics and Analysis:
- Forensic imaging tools (FTK Imager, dd, CAINE)
- Memory analysis tools (Volatility, Rekall)
- Network capture tools (Wireshark, tcpdump, NetworkMiner)
- Malware analysis sandbox (Any.Run, Joe Sandbox, Cuckoo)
- Log aggregation and analysis platform

Communication and Coordination:
- Out-of-band communication channel (not dependent on compromised infrastructure)
- Incident tracking system (separate from production ticketing)
- Secure file sharing for evidence and documentation
- Contact lists for all stakeholders (printed copies as backup)

Documentation:
- Incident response plan (this document)
- Network diagrams and asset inventory
- Baseline configurations for critical systems
- Escalation procedures and contact trees
- Regulatory notification requirements by jurisdiction

1.3 Training and Exercises

A plan that has never been tested is not a plan -- it is a wish. Conduct the following exercises regularly:

| Exercise Type | Frequency | Participants | Duration |
|---------------|-----------|--------------|----------|
| Tabletop exercise | Quarterly | Full IRT + executives | 2-4 hours |
| Technical drill | Monthly | Technical IRT members | 1-2 hours |
| Full simulation | Annually | All stakeholders | Full day |
| Purple team exercise | Semi-annually | Red + blue teams | 2-5 days |

Document all exercises, including identified gaps, and update the IR plan accordingly.


Phase 2: Detection and Analysis

Detection is where most organizations struggle. Industry studies such as the IBM Cost of a Data Breach Report have put the average time to identify a breach at roughly 200 days. Reducing this window is critical.

2.1 Common Attack Vectors

Understanding how incidents begin helps prioritize detection capabilities:

| Attack Vector | Percentage of Breaches | Key Detection Methods |
|---------------|------------------------|-----------------------|
| Phishing/Social Engineering | 36% | Email gateway alerts, user reports, impossible travel |
| Stolen/Compromised Credentials | 29% | Impossible travel, anomalous access patterns, dark web monitoring |
| Vulnerability Exploitation | 18% | IDS/IPS alerts, WAF logs, vulnerability scanning |
| Insider Threat | 11% | DLP alerts, unusual data access, behavioral analytics |
| Supply Chain Compromise | 6% | Software integrity monitoring, vendor risk alerts |

2.2 Initial Triage Process

When a potential incident is detected, perform initial triage to determine if it is a true incident and assess its severity:

Alert Received
    |
    v
Is this a known false positive?
    |-- Yes --> Document and tune detection rule
    |-- No  --> Continue triage
         |
         v
    Can you confirm malicious activity?
         |-- Yes --> Classify severity (see Incident Classification and Severity Levels)
         |-- No  --> Gather more data (logs, network captures, endpoint telemetry)
              |
              v
         Still inconclusive after 30 minutes?
              |-- Yes --> Escalate to Severity 3 (potential incident)
              |-- No  --> Document findings and close
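
The decision tree above can be sketched as a small function, which may help when wiring triage into a ticketing or SOAR workflow. This is a minimal illustration, not part of the repository: the names `TriageOutcome`, `triage`, and the `ruled_out` flag are hypothetical, and `ruled_out` reflects one reading of the final "No" branch (the extra data showed the activity was benign).

```python
from enum import Enum

# 30-minute limit taken from the triage flow above.
TRIAGE_LIMIT_MINUTES = 30


class TriageOutcome(Enum):
    TUNE_RULE = "document and tune detection rule"
    CLASSIFY_SEVERITY = "classify severity"
    GATHER_MORE_DATA = "gather more data"
    ESCALATE_SEV3 = "escalate to Severity 3 (potential incident)"
    CLOSE = "document findings and close"


def triage(known_false_positive: bool, malicious_confirmed: bool,
           ruled_out: bool, minutes_elapsed: float) -> TriageOutcome:
    """Walk the triage flow top to bottom and return the next action."""
    if known_false_positive:
        return TriageOutcome.TUNE_RULE
    if malicious_confirmed:
        return TriageOutcome.CLASSIFY_SEVERITY
    if ruled_out:
        # Assumption: additional data showed the activity was benign.
        return TriageOutcome.CLOSE
    if minutes_elapsed >= TRIAGE_LIMIT_MINUTES:
        return TriageOutcome.ESCALATE_SEV3
    return TriageOutcome.GATHER_MORE_DATA
```

An analyst 12 minutes into an unresolved alert would get `GATHER_MORE_DATA`; the same alert at 30 minutes or later would return `ESCALATE_SEV3`.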

2.3 Evidence Collection During Analysis

During the analysis phase, begin collecting evidence immediately. Follow these principles:

  1. Preserve volatile evidence first -- Memory dumps before disk images, network connections before system shutdown
  2. Document everything -- Every action taken, every finding, every decision with timestamps
  3. Maintain chain of custody -- Use evidence collection forms, hash all evidence files, store securely
  4. Work from copies -- Never analyze original evidence; always work from forensic copies
  5. Use the triage script -- Run scripts/incident-triage.sh on affected systems to collect initial forensic data
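
As an illustration of principle 3 (hash all evidence files), the sketch below computes a SHA-256 digest and builds a simple custody record. It is an assumption-laden example rather than the repository's own tooling: `sha256_file`, `custody_record`, and the record fields are hypothetical names, and a real chain-of-custody form will likely need more fields.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large disk or memory images fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_record(path: Path, collected_by: str, case_id: str) -> dict:
    """Build one custody entry (hypothetical field names)."""
    return {
        "case_id": case_id,
        "file": str(path),
        "size_bytes": path.stat().st_size,
        "sha256": sha256_file(path),
        "collected_by": collected_by,
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(),
    }
```

Recomputing the digest on the working copy and comparing it with the recorded value is one way to confirm that analysis is being done on an unaltered forensic copy.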

2.4 Indicators of Compromise (IOC) Management

Track and share IOCs throughout the investigation:

Use STIX/TAXII formats for IOC sharing with industry partners and ISACs.
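
For teams that have not worked with STIX before, the sketch below shows roughly what a STIX 2.1 indicator object looks like when built with only the Python standard library. The function names and the example IP address (from a documentation range) are hypothetical, and the output is not validated against the specification, so a dedicated STIX library is the safer choice for production sharing.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_indicator(pattern: str, name: str, created: datetime) -> dict:
    """Build a minimal STIX 2.1 indicator; `created` must be timezone-aware."""
    timestamp = created.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": timestamp,
        "modified": timestamp,
        "name": name,
        "pattern": pattern,
        "pattern_type": "stix",
        "valid_from": timestamp,
    }


def to_bundle_json(indicators: list) -> str:
    """Wrap indicators in a STIX bundle and serialize it as JSON."""
    bundle = {
        "type": "bundle",
        "id": f"bundle--{uuid.uuid4()}",
        "objects": indicators,
    }
    return json.dumps(bundle, indent=2)
```

A call such as `stix_indicator("[ipv4-addr:value = '198.51.100.7']", "Suspected C2 address", datetime.now(timezone.utc))` yields one object that can be bundled and pushed to a TAXII server or shared with an ISAC.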


Phase 3: Containment, Eradication, and Recovery

3.1 Containment Strategies

Containment must balance stopping the attack with preserving evidence and maintaining business operations.

Short-Term Containment (first hours):
- Isolate affected systems from the network (do not power off -- preserve memory)
- Block known malicious IPs, domains, and hashes at the perimeter
- Disable compromised user accounts
- Implement additional monitoring on potentially affected systems
- Activate out-of-band communication channels for the IR team

Long-Term Containment (days to weeks):
- Move affected systems to an isolated VLAN for analysis
- Deploy additional security controls around critical assets
- Implement enhanced authentication (MFA enforcement, password resets)
- Increase logging verbosity on all systems in the environment
- Engage external IR support if needed

3.2 Eradication

Once containment is stable, eliminate the threat:

  1. Identify all compromised systems -- Use IOCs from investigation to scan the entire environment
  2. Remove malware and attacker tools -- Clean or reimage affected systems
  3. Close attack vectors -- Patch exploited vulnerabilities, revoke compromised credentials
  4. Eliminate persistence mechanisms -- Check scheduled tasks, startup items, registry run keys, cron jobs, web shells, implanted SSH keys
  5. Verify eradication -- Re-scan environment with updated IOCs, monitor for recurrence

3.3 Recovery

Recovery must be methodical to avoid reintroducing the threat:

  1. Restore from known-good backups -- Verify backup integrity before restoration
  2. Rebuild rather than clean when possible -- Reimaging provides higher confidence than malware removal
  3. Restore in phases -- Start with critical systems, monitor each before proceeding
  4. Validate functionality -- Test all restored systems before returning to production
  5. Monitor aggressively -- Enhanced monitoring for at least 30 days post-recovery
  6. Document the recovery timeline -- Track every system restored, when, and by whom

Recovery Priority Matrix

| Priority | System Type | Target Recovery Time | Examples |
|----------|-------------|----------------------|----------|
| P1 - Critical | Core business operations | 4 hours | Domain controllers, email, ERP, patient care systems |
| P2 - Important | Supporting business functions | 24 hours | File servers, databases, VPN, CRM |
| P3 - Standard | Normal operations | 72 hours | Print servers, development environments, internal tools |
| P4 - Low | Non-essential | 1 week | Test environments, archive systems |
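
The priority matrix above lends itself to a simple ordering helper. The following sketch sorts affected systems by priority and attaches a target completion time to each; `RECOVERY_TARGETS` mirrors the matrix, while `recovery_plan` and the tuple layout are hypothetical choices made for illustration.

```python
from datetime import datetime, timedelta

# Target recovery times copied from the Recovery Priority Matrix.
RECOVERY_TARGETS = {
    "P1": timedelta(hours=4),
    "P2": timedelta(hours=24),
    "P3": timedelta(hours=72),
    "P4": timedelta(weeks=1),
}


def recovery_plan(systems, start: datetime):
    """Order (name, priority) pairs by priority and add a target deadline.

    Systems with the same priority keep their input order because
    sorted() is stable.
    """
    ordered = sorted(systems, key=lambda item: item[1])
    return [
        (name, priority, start + RECOVERY_TARGETS[priority])
        for name, priority in ordered
    ]
```

Passing the list of affected systems and the time recovery begins gives a phase-by-phase schedule that can be pasted into the recovery timeline.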

Phase 4: Post-Incident Activity

The post-incident phase is where organizations build lasting resilience. Skip this phase, and you are likely to repeat the same mistakes.

4.1 Post-Incident Review (Lessons Learned)

Conduct a formal post-incident review within 5 business days of incident closure. Use the template at templates/post-incident-review.md.

Key questions to answer:
- What happened, and what was the root cause?
- How was the incident detected, and could we have detected it sooner?
- Were our containment actions effective? What would we do differently?
- Were our communication procedures adequate?
- What gaps in tools, processes, or training were identified?
- What specific improvements will we implement, and by when?

4.2 Metrics to Track

| Metric | Target | Why It Matters |
|--------|--------|----------------|
| Mean Time to Detect (MTTD) | < 24 hours | Measures detection capability |
| Mean Time to Respond (MTTR) | < 4 hours | Measures response efficiency |
| Mean Time to Contain (MTTC) | < 8 hours | Measures containment effectiveness |
| Mean Time to Recover | < 72 hours | Measures recovery capability |
| False Positive Rate | < 10% | Measures detection accuracy |
| Incidents per Month | Trending down | Measures overall security posture |
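
The time-based metrics above can be computed from per-incident timestamps. The sketch below is one possible approach under stated assumptions: MTTD is measured from compromise to detection, and the other three are measured from detection. The playbook does not define these start points, so adjust them to your own definitions. `IncidentTimeline`, `ir_metrics`, and `missed_targets` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

# Targets in hours, copied from the metrics table.
TARGET_HOURS = {"mttd": 24, "mttr": 4, "mttc": 8, "recover": 72}


@dataclass
class IncidentTimeline:
    compromised: datetime
    detected: datetime
    responded: datetime
    contained: datetime
    recovered: datetime


def _mean_hours(deltas) -> float:
    """Average a list of timedeltas and express the result in hours."""
    return mean(delta.total_seconds() for delta in deltas) / 3600


def ir_metrics(incidents) -> dict:
    """Compute mean times in hours (assumed definitions, see lead-in)."""
    return {
        "mttd": _mean_hours([i.detected - i.compromised for i in incidents]),
        "mttr": _mean_hours([i.responded - i.detected for i in incidents]),
        "mttc": _mean_hours([i.contained - i.detected for i in incidents]),
        "recover": _mean_hours([i.recovered - i.detected for i in incidents]),
    }


def missed_targets(metrics: dict) -> list:
    """Return the metrics that are not strictly under their target."""
    return sorted(name for name, hours in metrics.items()
                  if hours >= TARGET_HOURS[name])
```

Running `missed_targets(ir_metrics(incidents))` over a quarter's incidents gives a short list to bring to the post-incident review.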

4.3 Evidence Retention

Retain all incident evidence according to the following schedule:

| Evidence Type | Retention Period | Reason |
|---------------|------------------|--------|
| Forensic images | 7 years | Legal/regulatory requirements |
| Log files | 3 years | Compliance requirements |
| Incident reports | 7 years | Legal/regulatory requirements |
| Communication records | 3 years | Compliance requirements |
| IOC data | Indefinitely | Ongoing threat intelligence |

Incident Classification and Severity Levels

Severity Definitions

| Severity | Name | Description | Response Time | Notification |
|----------|------|-------------|---------------|--------------|
| SEV-1 | Critical | Active data breach, ransomware, business-critical systems compromised | Immediate (< 15 min) | Executive team, legal, potentially regulators |
| SEV-2 | High | Confirmed compromise of systems containing sensitive data, active attacker in environment | < 1 hour | IR Manager, CISO, system owners |
| SEV-3 | Medium | Suspicious activity confirmed, limited scope, no confirmed data access | < 4 hours | IR team, affected system owners |
| SEV-4 | Low | Security event requiring investigation, no confirmed malicious activity | < 24 hours | On-call analyst |
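
The severity definitions above can be encoded so that tooling assigns a response deadline automatically. This is a simplified sketch: the three boolean inputs to `classify_severity` are a hypothetical reduction of the table's descriptions, and real classification will usually need analyst judgment.

```python
from datetime import datetime, timedelta

# Response windows copied from the Severity Definitions table.
RESPONSE_WINDOWS = {
    "SEV-1": timedelta(minutes=15),
    "SEV-2": timedelta(hours=1),
    "SEV-3": timedelta(hours=4),
    "SEV-4": timedelta(hours=24),
}


def classify_severity(critical_impact: bool, confirmed_compromise: bool,
                      suspicious_activity_confirmed: bool) -> str:
    """Map simplified incident facts to a severity, most severe first."""
    if critical_impact:          # active breach, ransomware, critical systems
        return "SEV-1"
    if confirmed_compromise:     # sensitive systems or an active attacker
        return "SEV-2"
    if suspicious_activity_confirmed:
        return "SEV-3"
    return "SEV-4"


def response_due(severity: str, detected_at: datetime) -> datetime:
    """Return the latest time a response should begin."""
    return detected_at + RESPONSE_WINDOWS[severity]
```

Checking the most severe condition first means an incident that meets several descriptions is always assigned the highest applicable level.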

Incident Categories


Roles and Responsibilities

RACI Matrix for Incident Response

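| Activity | IR Manager | Lead Analyst | Comms Lead | Legal | Executive |
|----------|------------|--------------|------------|-------|-----------|
| Initial triage | A | R | I | I | - |
| Severity classification | R | C | I | I | I |
| Technical investigation | A | R | - | I | I |
| Containment decisions | R | C | I | C | I |
| External communications | A | - | R | C | I |
| Regulatory notifications | A | - | C | R | I |
| Recovery authorization | C | - | - | C | R |
| Post-incident review | R | C | C | C | I |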

R = Responsible, A = Accountable, C = Consulted, I = Informed

On-Call Rotation

Maintain a 24/7 on-call rotation for incident response. Ensure:
- Primary and secondary on-call analysts are assigned weekly
- On-call contact information is updated monthly
- Escalation paths are clearly defined and tested
- On-call analysts have remote access to all necessary tools
- Handoff procedures are documented for shift changes during active incidents


Communication Plan

Effective incident communication prevents confusion, protects legal interests, and maintains stakeholder trust. See templates/communication-plan.md for the full template.

Internal Communication Guidelines

External Notification Requirements

| Stakeholder | When to Notify | Who Notifies | Timeline |
|-------------|----------------|--------------|----------|
| Regulators (HIPAA/HHS) | PHI breach affecting 500+ individuals | Legal + Privacy Officer | 60 days |
| State AG | PII breach per state law | Legal | Varies (24hrs - 60 days) |
| Law enforcement | Criminal activity, significant breach | Legal + IR Manager | Within 24-72 hours |
| Cyber insurance | Any covered incident | Legal + Risk Management | Per policy (typically 72 hours) |
| Affected individuals | PII or PHI breach | Communications + Legal | Per applicable law |
| Business partners | Breach affecting shared data | Account Management + Legal | Per contract |
| CISA | Critical infrastructure incidents | IR Manager + Legal | 72 hours |

Digital Forensics Fundamentals

Order of Volatility

When collecting evidence, always start with the most volatile data:

  1. CPU registers, cache (nanoseconds)
  2. Memory (RAM) (nanoseconds)
  3. Network state (connections, ARP cache, routing tables)
  4. Running processes (process list, open files, loaded modules)
  5. Disk (file system, swap space, raw sectors)
  6. Remote logging data (SIEM, syslog, cloud logs)
  7. Archival media (backups, offline storage)
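
To show how the order of volatility might drive a collection script, the sketch below sorts a list of evidence sources so that the most volatile are collected first. The source labels in `VOLATILITY_ORDER` are hypothetical short names for the seven items above.

```python
# Most volatile first, mirroring the numbered list above.
VOLATILITY_ORDER = [
    "cpu_registers_cache",
    "memory",
    "network_state",
    "running_processes",
    "disk",
    "remote_logs",
    "archival_media",
]


def collection_order(sources):
    """Sort evidence sources from most to least volatile.

    Raises ValueError for a label that is not in VOLATILITY_ORDER,
    so an unknown source cannot be silently misplaced.
    """
    return sorted(sources, key=VOLATILITY_ORDER.index)
```

For example, a responder who plans to collect disk, memory, remote logs, and network state would be told to start with memory and finish with remote logs.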

Evidence Collection Best Practices

See templates/forensics-checklist.md for the complete forensics collection checklist.


Incident Response for Compliance

Compliance Framework IR Requirements

| Framework | IR Requirement | Key Controls |
|-----------|----------------|--------------|
| HIPAA | 45 CFR 164.308(a)(6) -- Security Incident Procedures | Incident identification, response, mitigation, documentation, breach notification |
| CMMC Level 2 | IR.L2-3.6.1 through IR.L2-3.6.3 | Incident handling capability, tracking/documenting/reporting, testing IR capability |
| SOC 2 | CC7.3 -- CC7.5 | Detection, response, communication of system incidents |
| PCI DSS | Requirement 12.10 | IR plan, testing, personnel assignment, incident response training |
| NIST CSF 1.1 | RS.RP, RS.CO, RS.AN, RS.MI, RS.IM | Response planning, communications, analysis, mitigation, improvements |
| ISO 27001:2013 | Annex A.16 -- Information Security Incident Management | Reporting, assessment, response, learning from incidents |

Audit Documentation

Maintain the following for compliance audits:
- Current incident response plan (reviewed annually at minimum)
- Records of all incidents and responses (per retention schedule)
- Evidence of regular testing (tabletop exercises, simulations)
- Training records for IR team members
- Post-incident review reports with improvement tracking
- Third-party IR retainer agreements (if applicable)


Templates and Resources

This repository includes the following templates:

| Template | Purpose | Location |
|----------|---------|----------|
| Incident Response Plan | Complete IR plan template | templates/incident-response-plan.md |
| Communication Plan | Internal/external communication procedures | templates/communication-plan.md |
| Forensics Checklist | Step-by-step evidence collection guide | templates/forensics-checklist.md |
| Post-Incident Review | Lessons learned meeting template | templates/post-incident-review.md |
| Incident Triage Script | Automated system data collection for forensics | scripts/incident-triage.sh |

About Petronella Technology Group

Petronella Technology Group has been a trusted cybersecurity and IT services provider for over 23 years. Founded by Craig Petronella, a 15x published author on cybersecurity and compliance, PTG helps organizations of all sizes build resilient security programs, achieve compliance, and respond effectively to cyber threats.

Why Work with PTG for Incident Response?

Get Expert Help

If you are currently experiencing a security incident, or want to build a comprehensive incident response program, contact our team:

Additional Resources


Contributing

We welcome contributions from the cybersecurity community. Please submit pull requests with improvements to templates, additional playbooks for specific incident types, or corrections to existing content.

License

This project is licensed under the MIT License -- see the LICENSE file for details.


Built with real-world incident response experience by Petronella Technology Group -- Securing businesses for over 23 years.